Design of the DEC LANcontroller 400 Adapter
Abstract
The DEC LANcontroller 400, Digital's XMI-to-Ethernet adapter (DEMNA), connects systems based on the Digital XMI bus to an Ethernet/IEEE 802.3 local area network (LAN). These systems use the XMI bus either as the system bus (VAX 6000 systems) or as an I/O bus (VAX 9000 systems). The new systems, which can utilize the full bandwidth of the Ethernet, are characterized by increased host processor speeds. The DEMNA adapter was designed to support these I/O requirements. In addition, console and monitor facilities were built into the adapter firmware for debugging, verification, and user visibility. The adapter's performance for small packets exceeds system capabilities, and Ethernet bandwidth is the limiting factor for large packets.

Digital Technical Journal Vol. 3 No. 3 Summer 1991

The high-performance DEC LANcontroller 400, Digital's XMI-to-Ethernet adapter (DEMNA), connects a system based on the Digital XMI bus to an Ethernet/IEEE 802.3 local area network (LAN). This adapter is intended for Digital systems that use the XMI bus either as the system bus (VAX 6000 systems) or as an I/O bus (VAX 9000 systems). It is an intelligent adapter that implements the physical layer and part of the data link layer of network protocol. The term intelligent refers to the packet processing performed by the adapter as part of the data link layer.

The DEMNA adapter was needed to support the I/O requirements of the VAX 6000 and VAX 9000 systems, which can utilize the full bandwidth of the Ethernet. The adapter also provides the ability to configure these systems without a BI bus. For these systems, the DEMNA adapter is the only Ethernet connection available.

The DEMNA adapter is controlled by a port driver that resides in host memory. The interface between the port driver and the DEMNA firmware (the port) is a ring-based design which is optimized for low system overhead and high performance.

The DEMNA adapter has the following major features:

o Supports Ethernet/IEEE 802.3 protocols
o Supports up to 64 users (each one a separate protocol such as local area transport [LAT] software, DECnet network software, or clusters)
o Supports two modes of addressing: VAX virtual addressing and 40-bit physical addressing
o Allows buffer chaining on transmit
o Performs packet filtering and validation on receive
o Supports Digital's maintenance operations protocol (MOP) functions
o Provides support for diagnostic routines and field service functions implemented through the system console or diagnostic software
o Has console and monitor facilities that allow a console user to monitor DEMNA operation and network utilization

This paper begins with a logic overview of the DEMNA device. The sections that follow discuss the factors that influenced design and implementation, describe the major performance metrics and user visibility operations, and review the design results and future needs.

Logic Overview

The DEMNA adapter is a single-board XMI adapter based on complementary metal-oxide semiconductor/transistor transistor logic (CMOS/TTL) technology. As shown in Figure 1, the hardware consists of four separate subsystems:

o Microprocessor
o Direct memory access (DMA) and shared memory
o XMI interface
o Ethernet

The microprocessor subsystem contains the CMOS VAX (CVAX) processor, system support chip (SSC), boot read-only memory (ROM), Ethernet address programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), and random-access memory (RAM). The microprocessor subsystem provides an internal, high-speed CDAL bus so that the CVAX processor can fetch its instructions and execute them without being delayed by the other controllers on the module. The firmware is stored in EEPROM, but is copied to RAM for execution. The boot ROM contains the initialization code and diagnostics. This subsystem also provides a console interface through the SSC for diagnostics, module debugging, and network monitoring.

The DMA and shared memory subsystem provides the means of communication between the CVAX processor and the other subsystems. The devices arbitrating for this shared memory are the CVAX processor, the gate array, and the Local Area Network Controller for Ethernet (LANCE) chip.

The XMI interface subsystem contains the XMI network adapter (XNA) gate array and the XMI corner. The XNA gate array is the data-move engine for the DEMNA adapter and contains all the XMI-required registers.

The Ethernet subsystem contains the LANCE chip, the serial interface adapter (SIA) chip, and various bus interface logic modules. The Ethernet subsystem receives packets from the Ethernet and stores them in the shared memory. When transmitting a packet on the Ethernet, the LANCE chip gets the packets from shared memory and transmits them on the Ethernet.

Design

The design of the DEMNA adapter was influenced by many factors, including previous adapter design experiences, available hardware such as Ethernet chips, and system requirements. The DEMNA team was assigned the following tasks:

o Produce a working Ethernet adapter that could be used by operating systems such as VMS, ULTRIX, ELN, and custom operating systems on hardware configurations that use the XMI bus as a system bus or an I/O bus
o Deliver high performance, measured by the amount of Ethernet bandwidth supported at various packet sizes, with minimized host overhead
o Supply debugging features for design verification and field maintenance of the adapter

First, we reviewed previous adapters to determine what improvements could be made. We learned that a complex host interface complicated host software and adapter firmware and greatly affected performance. One of these adapters, the Digital BI Ethernet Network Adapter (DEBNA), implemented a generic port interface that used interlocked queues containing a queue entry with a buffer name that indexed into a buffer descriptor table (i.e., an additional level of indirection). In addition to the firmware complexity, the hardware was not well suited to a complex port interface.

Another area in which improvements could be made over previous Ethernet adapters was the amount of processing performed by the host processor during receive packet filtering, address translation, and buffer copies. Overall system performance improves if this processing can be reduced by performing part or all of these functions in the adapter. This difference transforms the adapter from a dumb adapter (much of the data link processing performed by the host) to an intelligent adapter (much of the processing performed by the adapter).

The results of our analysis of older Ethernet adapters led us to choose a design that employs a simple host interface, off-loads the host whenever possible, uses rings instead of queues, and supplies the address of the buffer directly with the ring entry rather than indirectly through another data structure.

The design of the adapter was now consistent with the needs of the new VAX 6000 and VAX 9000 systems. These systems, characterized by increased host processor speeds, needed increased I/O performance. The task of the DEMNA team was to fill that need for Ethernet I/O.

Type of Adapter

The DEMNA product is a store-and-forward adapter, i.e., it copies data to and from host memory by way of temporary storage on the adapter. This data transmission differs from that of a cut-through adapter in which data flows directly between host memory and the transmission medium. However, the DEMNA adapter is actually able to gain some of the benefits of cut-through on the receive side.

Host Interface

We designed a simple host interface, using rings instead of queues. Interrupts to the host were kept to a minimum, from one interrupt per packet at light loads to a fraction of that number under heavy loads. As seen in Figure 2, the port and the port driver (host) share the following data structures, which reside in host memory:

o Port data block. This structure gives the port the location of the rings and page tables in host memory and is a repository for error information.
o Command and receive rings. These rings contain information describing outstanding command and transmit requests and buffer information for receive buffers.
o Transmit, receive, and command buffers. These buffers contain packet data and command data.

These data structures constitute the primary means of communication and data transfer between the port and the port driver. Control status registers (CSRs) are provided for port poll demand registers, XMI context, and port initialization.

Two rings are used in the host interface: the command ring and the receive ring. Each ring consists of 1024 bytes of physically contiguous memory, and each ring contains entries that describe a buffer or a set of buffer segments (when chaining transmit buffers). The number of entries in the receive ring is fixed, since each entry points to a single contiguous buffer. The size of each transmit ring entry is variable and is fixed at initialization time. The port and port driver process the entries in each ring in sequential order, starting with the first entry. A ring entry can be processed only by its owner. When the last entry in the ring is reached, processing starts again with the first entry.
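The sequential, owner-gated ring processing described above can be sketched as follows. This is a minimal illustration in Python, not the DEMNA firmware or port driver; the names `Owner`, `RingEntry`, and `process_owned`, and the choice to represent ownership as a field in each entry, are assumptions made for the example rather than details given in the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Owner(Enum):
    PORT = "port"      # adapter firmware
    DRIVER = "driver"  # host port driver


@dataclass
class RingEntry:
    owner: Owner
    buffer_address: int = 0  # hypothetical field: buffer address carried in the entry


def process_owned(ring, index, me, other, handle):
    """Process consecutive entries owned by `me`, starting at `index`.

    Returns the index of the first entry not processed and the number handled.
    """
    handled = 0
    while handled < len(ring) and ring[index].owner is me:
        handle(ring[index])
        ring[index].owner = other        # hand the entry to the other party
        index = (index + 1) % len(ring)  # after the last entry, wrap to the first
        handled += 1
    return index, handled


# Hypothetical 4-entry ring: the driver fills three entries and gives them to the port.
ring = [RingEntry(Owner.DRIVER) for _ in range(4)]
for i, address in enumerate((0x1000, 0x2000, 0x3000)):
    ring[i].buffer_address = address
    ring[i].owner = Owner.PORT

seen = []
index, handled = process_owned(
    ring, 0, Owner.PORT, Owner.DRIVER,
    lambda entry: seen.append(entry.buffer_address),
)
print(index, handled, [hex(a) for a in seen])
```

In this sketch the port stops at the first entry it does not own, which is one plausible way to realize the rule that an entry can be processed only by its owner. Because the buffer address travels in the entry itself, no second lookup table is needed, which mirrors the design choice the text contrasts with the DEBNA interface.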
Host interrupts are minimized by using a ring release function, which counts the number of ring entries processed for completion by the port and the port driver. The port driver counts the number of completed entries and writes this count to a completion CSR when it has finished processing all the completed transmit and receive ring entries. The port maintains the same count and issues another interrupt whenever it sees that its count and the count last written by the port driver are different. This function ensures that the port driver is interrupted only when it stops processing the rings because there is nothing else to process. The port driver can process multiple completed transmits and receives after each interrupt as well. Thus, no spurious interrupts are issued and the number of interrupts is reduced by processing multiple completions at once.

Adapter Design

The firmware is written in VAX MACRO code. An alternative was to use MACRO for the transmit and receive paths and a higher-level language for initialization, shutdown, and error handling. However, this approach was not chosen because it complicates the interface and would have resulted in firmware size difficulties.

CVAX RAM (used by the CVAX processor exclusively) consists of 256 kilobytes and contains the firmware and data structures (the firmware is copied to RAM during self test). Smaller RAMs would have been slightly less expensive but would have complicated the firmware update procedure and limited the ability of the firmware to use the large data structures needed for receive packet filtering.

Shared RAM (shared by the CVAX processor and the LANCE chip) consists of another 256 kilobytes. This RAM contains the transmit and receive buffers as well as the LANCE transmit and receive rings. There is a vast amount of buffering space here, so the DEMNA device can tolerate a considerable amount of inattention from the host before being forced to discard incoming receive packets.

Erasable programmable read-only memory (EPROM) consists of 128K bytes for diagnostics and firmware boot code, including a backup copy of sufficient operational firmware to allow an update of EEPROM for initial load or subsequent update. EEPROM consists of 64K bytes for operational firmware, diagnostic patches, and error history data.

The gate array (data mover) handles the data move and quadword read/write operations. The data-move operations transfer buffers between the host and shared RAM. The quadword read/write operations are used for control functions, such as reading ring entries, reading address translation information, and writing ring status on completion. Once the firmware initiates a data-move operation, other work is performed by the firmware while the data move progresses.

Interrupts are very costly; therefore, we chose to limit the number of interrupts fielded by the CVAX processor. A LANCE interrupt costs the CVAX interrupt overhead, plus a LANCE CSR access, plus some normal interrupt overhead to save and restore registers. A data-move interrupt is less costly, but the firmware can be coded so that the data-move operation is usually complete, thus eliminating the need for the interrupt. Polling is performed for all LANCE- and data-move-related functions, but interrupts are used for local console I/O and error events.

Driver Design

The DEMNA team needed to design a driver that would be compatible with existing drivers but that would use all the features provided by the adapter. For VMS systems, this meant using the set of common routines that provide much of the data link functionality of the driver, but avoiding packet filtering. Another goal was to limit the copying of data by passing requests directly to the adapter.

For ULTRIX systems, the driver runs at a lower level with respect to packet filtering so it cannot take advantage of this feature. However, buffer chaining is used on the transmit side. As a transmit request traverses the various software layers, it accumulates buffer segments which the driver has to concatenate into a transmit frame. To avoid buffer copies in all but the extreme and infrequent cases, the driver then passes up to 11 buffer segments to the adapter.

To allow customer-written drivers for special applications, we documented the interface to make it readily available to customers.

Debug Tools

The adapter has a very simple mission in life: to transmit and receive packets. To verify operation, some debug tools are needed. The goal for the DEMNA team was to provide extensive debug tools both in the operational firmware and in standalone user tools. This design would allow debugging and verification in the development lab and in other, less-controlled environments. These debug tools are discussed further in the Visibility section.

Implementation

This section describes the implementation of the DEMNA adapter through its major functional blocks:

o Scheduler
o Port processing
o Command processing
o Transmit task
o Receive task
o Console task
o Monitor task

Scheduler

The scheduler is a round-robin routine that simply checks for work, does it, checks for work, does it, etc. There are no context switches, but some context is maintained in registers and shared by all routines. The scheduler, when idle, consists of about 18 VAX MACRO instructions. Transmit and receive tasks are given higher priority by duplicating their scheduler entry. When not idle, one pass of the scheduler processes four packets.

Port Processing

Port processing controls adapter initialization and shutdown, LANCE initialization and restart, fatal adapter error handling, gate array error handling, and miscellaneous host interface functions. This task also handles firmware updates of EEPROM.

Command Processing

The command ring usually contains transmit buffers, which can contain commands for special functions. These commands are included in the command ring to allow the port driver to synchronize control requests with transmit requests, e.g., user startup and stopping. Command processing routines are called by the transmit task after the command buffer has been read from host memory. The commands consist of user startup (consisting of user context such as protocol type, packet format, physical address to use, and multicast addresses to enable), user stopping, read counters, and a set of maintenance commands.

Transmit Task

The transmit task copies a packet from the host memory to adapter buffer memory and tells the LANCE to transmit it onto the Ethernet (store and forward). After the LANCE has completed the request, the firmware writes transmit status to the command ring entry, signifying completion of the transmit.

To minimize service time, the code in the transmit path was carefully scrutinized. The number of checks and branches was minimized for the optimized path. The optimized path through the transmit code is the 30-bit virtual addressing path, which is the most used. However, the 40-bit physical addressing path still results in better throughput because this path does not require any address translations, which are time-consuming. The instruction sizes were shortened when possible, using word instructions instead of longword instructions, to reduce the amount of instruction prefetch by the CVAX processor. Routines were placed on quadword boundaries to maximize cache efficiency. When waiting for data moves to complete (getting the transmit buffer from host memory) or obtaining address translation information from the host, the firmware was designed to perform other functions to increase the probability that the operation would be performed when the firmware needed it.

Receive Task

The receive task has the simple job of handing received packets to the port driver. This task is complicated by the need to off-load the host of part of receive processing (including packet filtering, packet validation, maintenance of counters, and processing MOP messages) and to make duplicates of packets when more than one user has requested a copy. It is further complicated by the need to provide buffering, which the port driver uses to prevent the driver from supplying large numbers of buffers. For enhanced performance, the firmware deals with receive packets in small groups (192 bytes) to allow the benefit of cut-through on larger packets.

Packet filtering is done for the destination address and for user type: the protocol type for Ethernet, the destination service access point (DSAP) field for 802, or the protocol identifier (PID) value for 802 subnetwork access protocol (SNAP) packets. Additional filtering is done for users who request all traffic or all multicast traffic. Filtering is done by maintaining a 64-bit user mask, which accumulates the list of users who want a copy of the packet according to the characteristics of the packet and what each user has requested.

Packet validation consists of length checks for Ethernet frames (if the user is using a length field after the protocol type) and for 802 frames. This saves the driver a little work. Additionally, users can request only packets smaller than a selected size; the adapter discards packets that exceed this size.

The cut-through feature adds complexity and reduces throughput on small packets, but provides many benefits for larger packets. When a packet larger than 192 bytes is received, the packet filtering and validation of all but the length is done for the first segment. This segment is then copied into the host buffer, and subsequent segments are copied appropriately. The last segment completes the packet validation and cyclic redundancy check (CRC). The difficulty occurs when the packet validation fails or an error is detected, because the packet is discarded and the context for the now-free receive buffer has to be restored. The firmware elects to save as little context as possible for each packet and to regenerate buffer context after the error, i.e., fetching the ring descriptor anew and redoing the address translation.

Console Task

The console task accepts and parses console commands and displays the requested data. There are two means of accessing the console: local and remote. The local console is accessed by a terminal connected directly to the DEMNA adapter. The remote console is accessed through MOP console carrier commands directed at the adapter from another system. A remote console may also be used to access a DEMNA device on the local system (coming in through transmit instead of receive). The firmware does not distinguish between transmit and receive operations from remote consoles. The console block accepts the commands and decodes them, and the monitor block determines the status. The monitor block passes this status back to the console block where it is formatted and displayed on the screen.

Due to code size limitations in the EEPROM, compressed versions of the console screens are stored in the EEPROM. At initialization time the screens are uncompressed and stored in the RAM. (The screen compression saved 5 kilobytes in the EEPROM.) To easily set up and maintain the screens, especially since they often changed during the project, the screens were set up in separate text files. The fields in the screen were coded with different data types, such as date or longword. The screen was then put through a PASCAL program to convert it to a VAX MACRO data structure and compress it.

The local console and the remote console can be run simultaneously. They have separate input and output buffers, the same decode and formatting code, and different input and output methods.

The remote console uses the MOP console carrier, coming in on transmit or receive. The command/poll and response/acknowledge commands are sent by the MOP program, i.e., either the network control program (NCP) or a user program that implements the MOP console carrier. The console code extracts the input characters from the command/poll packet and returns a response/acknowledge packet with any available data from the remote console output buffer. When a command has been entirely received, it is decoded and executed and the response placed in the remote console output buffer, which is sent back to the user in response/acknowledge packets.

The local console is a terminal directly connected to the DEMNA device and interfaced through the SSC universal asynchronous receiver transmitter (UART). This terminal connection receives and transmits one character at a time. Characters are collected into the local console input buffer and complete commands are parsed and executed. Response data is placed in the local console output buffer. The local console uses interrupts to signal when a character has been typed or when the UART is ready to transmit another character. These are the only interrupts used on the module, except for error interrupts. Since console interrupts are relatively infrequent, they are less costly than polling.

Monitor Task

The monitor facility operates mainly during receive or transmit. It also runs as a low priority entry in the scheduler to deal with debugging and verification activities (when debugging firmware is enabled).

Performance

As stated previously, the primary goal of the DEMNA adapter was high-speed performance, i.e., this adapter would not create a bottleneck when placed in a system. The major performance metrics we identified were throughput, service time, latency, and reliability.

o Throughput is the number of packets or bytes of packet data that can be transmitted or received per unit of time.
o Service time is the time a packet spends in each stage along its path from source through host software and driver, through adapter, over wire, through adapter, and through driver and host software to the destination.
o Latency is another measure of service time. It is a measure of delays encountered by queue depths of more than one at various points.
o Reliability is measured as the probability of packet loss under a receive load. It is also measured as adapter buffering and host buffer allocation effectiveness. For some protocols, recovery from packet loss takes a significant amount of time, and the loss of a packet may be quite noticeable to a user. Hence, recovery is related to a user's perception of reliable operation.

The performance goal of the DEMNA team was to minimize the service time through the adapter to maximize throughput. This is most critical for small packet sizes. If the service time is greater than the time it takes to transmit or receive a packet, then queue depths increase, increasing latency for subsequent packets. Small packets are critical because, obviously, they take less time to transmit or receive.

The speed of the Ethernet wire and the XMI bus must also be considered. The Ethernet operates at 10 megabits per second. The available bandwidth into memory and the capacity of the XMI are much greater; thus, the Ethernet is the limiting factor. To maintain maximum throughput, the DEMNA device must write and read packets to and from host memory at a speed equal to or greater than the Ethernet wire. If this speed is obtained, then the service time of the DEMNA adapter must be less than the time it takes to transmit or receive one 64-byte (small) packet to or from the Ethernet wire to maintain maximum throughput at all packet sizes.

Hardware

The primary hardware factors influencing adapter performance are CVAX performance, DMA engine throughput, and bus contention.

The gate array DMA engine can sustain between 11.5 and 13.5 megabytes per second on a VAX 6000 system. When transferring packet data (and attendant host ring processing), the firmware can sustain about 5.8 megabytes per second. This is the approximate rate at which the firmware would deliver a burst of large packets that had been stalled due to a lack of receive buffers.

The CVAX chip used is the 60-nanosecond variant (the same one used in the VAX 6000 Model 310 processor). As seen in Figure 1, the processor runs on its own internal CDAL bus which has RAM containing firmware and private data structures. Thus the processor does not contend for the same bus as the gate array and the LANCE chip. However, the CVAX processor does touch shared memory and gate array registers; therefore the possibility of contention is significant. Logic analyzer measurements indicate that about 14 percent of CVAX cycles are consumed while waiting for access to the shared memory bus for minimum size packets. For large packets the consumption is 33 percent, but the cycles needed are considerably less than the remainder. The effect on the gate array accounts for part of the difference between the speeds of 11.5 to 13.5 megabytes per second and of the 5.8 megabytes per second mentioned above.

Firmware

Throughput is limited by the Ethernet bandwidth for packet sizes greater than 88 bytes. The average packet size on Ethernet is approximately 150 to 450 bytes per packet for a mix of DECnet, LAT, and cluster traffic. Table 1 represents the throughput that the host software can see, given sufficient host compute capacity. These numbers show what might be expected. Virtual addressing costs some performance, and receive filtering accounts for most of the difference between transmit and receive.
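The service-time requirement stated in the Performance section can be made concrete with a little arithmetic. The sketch below computes how long a frame occupies a 10 megabit per second Ethernet and how many back-to-back frames per second that allows. The 10 megabit rate and the 64-byte minimum packet come from the text; the 8-byte preamble and the 12-byte-time interframe gap are standard IEEE 802.3 values assumed here, not figures from the paper, and the function names are hypothetical.

```python
ETHERNET_BITS_PER_SECOND = 10_000_000  # from the text: 10 megabits per second
PREAMBLE_BYTES = 8         # assumption: 802.3 preamble plus start frame delimiter
INTERFRAME_GAP_BYTES = 12  # assumption: 9.6 microsecond minimum gap at 10 Mb/s


def wire_time_us(frame_bytes):
    """Microseconds one frame occupies the wire, including preamble and gap."""
    total_bits = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return total_bits * 1_000_000 / ETHERNET_BITS_PER_SECOND


def max_frames_per_second(frame_bytes):
    """Upper bound on back-to-back frames per second at this frame size."""
    return 1_000_000 / wire_time_us(frame_bytes)


def keeps_up(service_time_us, frame_bytes):
    """True if the adapter's per-packet service time is below the wire time."""
    return service_time_us < wire_time_us(frame_bytes)


for size in (64, 1518):
    print(size, round(wire_time_us(size), 1), int(max_frames_per_second(size)))
```

Under these assumptions a 64-byte frame occupies the wire for about 67 microseconds, so an adapter whose per-packet service time stays below that figure would not fall behind even on a stream of minimum-size packets. The `keeps_up` helper only restates the inequality from the text; it does not model queueing or host-side delays.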
Journal title:
- Digital Technical Journal
Volume 3, Issue
Pages -
Publication date 1991